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We consider neutral evolution of a large population subject to changes in its population size. For a population 
with a time-variable carrying capacity we have computed the distributions of the total branch lengths of its 
sample genealogies. Within the coalescent approximation we have obtained a general expression - Eq. (27) - 
for the moments of these distributions for an arbitrary smooth dependence of the population size on time. We 
investigate how the frequency of population-size variations alters the distributions. This allows us to discuss their 
influence on the distribution of the number of mutations, and on the population homozygosity in populations 
with variable size. 
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[. INTRODUCTION 

Models for gene genealogies of biolo gical populations often assume a constant, time-indepe ndent population size N. This 



i gical populations often assume a constant, time-indepe ndent population i 
is the case for the Wright-Fisher model (IFisherL Il930l WRlGHli Il93ll) . for the Moran model dMORANl[l958h . and for their 



representation in terms of the coalescent (IKINGMANI 119821) . In real biological populations, by contrast, the population size 
changes over time. Such fluctuations may be due to catastrophic events (bottlenecks) and subsequent population expansions, or 
just reflect the randomness in the factors determining the population dynamics. Many authors have argued that genetic variation 
in a population subject to size fluctuations may nevertheless be described by the Wright-Fisher model, if one replaces the constant 
population size in this model by an effective population size of the form 

N eS = ( lim 1 [ T ^] (1) 



t^TJo N(t 



(see, e.g., EwensI (19821) for a review of different measures of the effective population size, and ISJODIN et al\ d2005l) : 



IWAKELEY and SargsyanI d2009l) for recent extensions of this concept). The harmonic average in Eq. (Q]) is argued to cap- 
ture the significant effect of catastrophic events on patterns of genetic variation in a population: if for example a population went 
through a a recent bottleneck, a large fraction of individuals in a given sample would originate from few parents. This in turn 
would lead to significantly reduced genetic variation, parameterised by a small value of N e s ■ 

The concept of an effective population size has been frequently used in the literature, implicitly assuming that the distribution 
of neutral mutations in a large population of fluctuating size is identical to the distribution in a Wright-Fisher model with the 
corresponding constan t effective population size given by Eq. CP- However, recently it has been shown that this i s true only under 
certain circumstance s djAGERS and SagitovI |2004 IKaj and KroneL 120031: INORDBORG and KroneI 120031) . It is argued by 
ISJODIN et al\ d2005b that the concept of an effective population size is appropriate when the time scale of fluctuations of N(t) is 
either much smaller or much larger than the typical time between coalescent events in the sample genealogy. In these limits it 
can be proven that the distribution of the sample genealogies is exactly given by that of the coalescent with a constant, effective 
population size. 

More importantly, it follows from these results that, in populations with variable size, the coalescent with a constant effective 
population size is not always a valid approximation for the sample genealogies. Deviations between the predictions of the stan- 
dard coalescent model and empirical data are frequ ently observed, and there is a number of different sta tistical tests quantifying 
the corresponding discrepancies (see for example dFu and Lj . 1 19931; ITajimaL Il989t IZENG ef^Zll2006|)). The analysis of such 
deviations is of crucial importance in understanding for example human genetic history (IGARRIGAN and HammerL|2006|) . But 
while there is a substantial amount of work numerically quantifying deviations, often in terms of a single number, little is known 
about their qualitative origins and their effect upon summary statistics in the population in question. 

The aim of this paper is to study the effect of population-size fluctuations on the patterns of genetic variation for the case 
where the scale of the population-size fluctuations is comparable to the time between coalescent events in the ancestral tree. 
As is well-known, empirical measures of genetic variation can usually be computed from the total branch length of the sample 
genealogy (the expected number of single-nucleotide polymorphisms, for example, is proportional to the average total branch 
length). In the following we therefore analyse the distributions of the total branch lengths for sample genealogies in a population 
of fluctuating size. An example is given in Fig. Q]which shows numerically computed branch-length distributions for a particular 
model population (described in Sec. |IV| ) with a time-dependent carrying capacity. 

As Fig. Q] shows, the distributions depend in a complex manner on the form of the size changes. We observe that when 
the frequency of the population-size fluctuations is either very small or very large, the results are well described by Kingman's 
coalescent with a constant (effective) population size. Apart from these special limits, however, the form of the distributions 
appears to depend in a complicated manner upon the frequency of the population-size variation. The observed behaviour is 
caused by the fact that coalescence proceeds faster for smaller population sizes, and more slowly for larger population sizes, as 
illustrated in Fig. [2] But the question is how to quantitatively account for the changes displayed in Fig. Q] 

We show in this paper that the results of the simulations shown in Fig. Q]are explained by a general expression - Eq. ( TZTT i - 
for the moments of the distributions shown in Fig.Q] Our general result is obtained within the coalescent approximation valid in 
the limit of large population size. But we find that in most cases, the coalescent approximation works very well down to small 
population sizes (a few hundreds of individuals). Our result enables us to understand and quantitatively describe the frequency 
dependencies of the distributions shown in Fig.[T] It makes possible to determine for example how the variance, skewness, and 
the kurtosis of these distributions depend upon the frequency of demographic fluctuations. This in turn allows us to compute the 
population homozygosity and to characterise genetic variation in populations with size fluctuations. 

The remainder of this paper is organised as follows. In Sec.[II]we review how empirical observables are related to the branch 
lengths of the sample genealogies. Section [III] summarises our analytical results for the moments of the total branch length. In 
Sec. [IV] we describe the model employed in the computer simulations. The corresponding numerical results are compared to the 
analytical predictions in Sec. [V] Finally, in Sec. [VI] we summarise how population-size fluctuations influence the distribution of 
total branch lengths, discuss the implications for patterns of genetic variation, and conclude with an outlook. 
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FIG. 1 Numerically computed distributions p(T„) of total branch lengths T„ in genealogies of samples of size n = 10. The model employed 
in the simulations is outlined in Sec. lIVI It describes a population subject to a time-varying carrying capacity. Panels a to e show p(T„) for 
populations with increasingly rapidly oscillating carrying capacity. The dashed red line in a shows a low-frequency approximation to p(T„) 
obtained for a constant carrying capacity. The dashed red lines in d and e show the large-frequency approximations given by Eq. l |49b . Further 
numerical and analytical results on the frequency dependence of the moments of these distributions are shown in Fig. [5] Parameter values (see 
Sec.|lV]for details): K = 10,000, r = 1, £ = 0.9, and a vK = 0.001, b vK = 0.1, c vK = 0.316, d vK = 1, and e vK = 100. 



II. OBSERVABLES 

In this section we review how empirical observables are related to the branch lengths of the sample genealogies. 

Patterns of genetic variation reflect the gene genealogy corresponding to a given sample. Within a neutral infinite-sites model, 
mutations are assumed to occur randomly at a constant rate jj. on the genealogy. For a sample of size «, the number S„ of 
single-nucleotide polymorphisms conditioned on the total branch length T„ of the sample genealogy has a Poisson distribution 
with mean 0T„/2 (here 9 is a scaled mutation parameter, 6 = 2jj.No where A^o is a suitable measure of the population size): 

(Sn) = ~ (T„) ■ (2) 

Similarly, moments of S n can be computed in terms of moments of T„. As is well known, the corresponding relations are most 
conveniently expressed in terms of the function F n (q) from which the moments can be computed by repeated differentiation with 
respect to q: 

F n (q) = so that (r*) = (-l)*^f n (0) . (3) 

Note that F n (6 /2) is the probability of observing no mutations in a sample of size n (thus F2(9/2) is the population homozy- 
gosity). 
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FIG. 2 Illustrates the effect of population-size oscillations on the genealogy of a sample of size n = 17 (schematic). Left: genealogy described 
by Kingman's coalescent for a large population of constant size, illustrated by the light blue rectangle. Right: sinusoidally varying population 
size. Coalescence is accelerated in regions of small population sizes, and vice versa. This significantly alters the tree and gives rise to changes 
in the distribution of the number of mutations, and of the population homozygosity. 



The corresponding function for the moments of S„ is found to be: 



e 



F„Gr(l-e-*)). 



(4) 



For a constant population size, this equation is equivalent to Eq. (1.3a) in dWATTERSONl[l975l) . 

In short, the distribution of single-nucleotide polymorphisms is determined by the function F„(q), or equivalently by the 
moments (T,f). 

Microsatellite loci by contrast are usually modeled in terms of a step-wise mutation model (lOHTA and KimuraL 1 19731) in 
which a mutation corresponds to either the gain or, equally likely, the loss of a repeat unit. Provided that such steps (mutations) 
occur according to a Poisson process, the distributi on of the difference / in the numbers of repeats between two r andomly 
sampled sequences is determined by the function F 2 dKlMMEL and ChakrabortyLI19961:[Ohta and Kimura[[1973"1) : 



1 f 2n 

Pj = ^ z J { dco CO&((Qj)F2(0{l -cos©)) 



(5) 



In summary, the function F„ (or equivalently the moments (T*)) allow to compute the statistical fluctuations of the numbers 
of single-nucleotide polymorphisms and of the number of steps in a step- wise mutation model. In Sec. HID we show how the 
moments and the function F„(q) may be determined for a large neutral population subject to smooth population-size changes of 
otherwise arbitrary form. 

III. COALESCENT APPROXIMATION FORMULAE FOR F n (q) AND (2*) 

In this section we show how to calculate the function F„(q) and the moments (7™) within the coalescent approximation, for a 
population with a smoothly varying size. 

For q = 9/2, the quantity F n -\(6/2) is just the probability that n— 1 sequences sampled at the present time are identical. 
Thus in a population of constant size, F„ (q) is given by 



( 2 )+nq 

This recursion has the well-known solution (with initial condition F\ = \) 

Y{n)Y{\+2q) 



Fn(q) = 



T{n + 2q) 



(6) 



(7) 
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FIG. 3 a Illustrates the definition of the variables %j and sj = Y^=j usec ' m tne calculation of F n (q). In the example given, the sample size 
is n = 5. b Illustrates the definition of the times in Eq. d2 lb . 



The question is how to obtain a corresponding expression for the case of a changing population size. We assume that in 
the limit of large population sizes, the changes in population size are described by a smooth curve x{t) — N(t)/No, so that 
< x(t) < 1 and x(0) = xq at the present time, t = 0. As common in the coalescent approximation time is counted 'backwards', 
that is t increases from the present (t = 0) to the past. 

Given a realisation of the curve x(t ), the functio n F„ (q) can be calculated as follo ws. The starting point is the joint distribution 
of times Ty (illustrated in Fig.[3k). As shown by IGRIFFITHS and TavareI dl994l) it can be written in terms the variables Sj = 

/(t 2 ,...,t„) = fJ^f'e-'iW-^')] . (8) 

7=2 

Here b j = j(j - l)/2 and A(f) = f At'x(t'y x is the 'population-size intensity function' defined by IGRIFFITHS and TavareI 
(Il994l) . The distribution of the times T, during which the sample genealogy has j lines depends upon the sample size n. This 
dependence is not made exp licit here, neither in Eq. (HJ nor in the following. The corresponding joint density for the variables 
Sj is simply (ITavareI[2004 

g(S2, ■ ■ ;Sn) = f\b jx{s j)'' ^AH*])-^)] (9) 
7=2 

(for < s„ < s n -\ < ■■■ < S2, and s n+ i = 0). 

Now we make use of the fact that the total time is given by 

T„ = s„ +i„_i + . . . +53 +2s 2 . (10) 

(see Fig. |3^). The function F„(q) can therefore be written as 

F„(q) - / d5„...d5 2 ^(5 2 ,...,5, I )e-^" +i "-'+-+^+ 2s 2). (ii) 

Expanding the multiple integrals one obtains 

F t q ) = b„ I" i^-e-K"- 1 ^''^^] ■■■b 2 r ifL e -[ A fe)+2^1 . (12) 
Jo x(s„) J S3 x(s2) 

For small sample sizes n, Eq. (fT2l provides a convenient way of computing the function F n (q) and the corresponding moments 
(T*). For example, for n = 2 one finds simply 

F 2 (q) = l-2« Hebe-*®- 2 *, (13) 
Jo 

(Tf) = 2 k k [°°dt t k - l e- A ®. (14) 
Jo 

This makes it possible to compute the population homozygosity in large populations with arbitrary size variations, as well as the 
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distribution of steps in a step- wise mutation model, according to Eq. (0. 

For large values of n, by contrast, the large number of nested integrals in $1% becomes increasingly difficult to evaluate. In 
this limit, however, the distribution is conveniently characterised in terms of its cumulants which can be expressed in terms of 
the moments (Tjf). 

In the remainder of this section we show how to calculate the moments for arbitrary sample sizes n. According to Eq. ©, these 
moments are obtained by repeated differentiation of Eq. (fT2l i. However, in the following we describe a more elegant approach 
making use of a result obtained by ITavarH d 19841) . As Fig. [3^ shows, an alternative expression for the total time is simply 
T„ = Y!m=2 mT »i- Th e ^ _tn moment of the distribution of T„ is therefore 

(T n k )= V * \ v "-2^W-T?) (15) 



v 2 ,v 3 v„ 

V2+V3H hV„=* 



V2,V 3 , 



where the variables V; can assume values between and k (subject to the constraint V2 + V3 H h v„ = k). In a population of 

constant size, x = 1, the varia bles T; are independent and their correlation functions factorise. In general this is not the case: 
IZlVKOVlC and WieheI d2008l) . for example, have calculated (t,-T/) for a smoothly varying population (Eqs. (2) and (3) in their 
paper). 

In the following we show how the correlation functions of arbitrary order appearing in ( TT3T > can be calculated in a very simple 
manner. Consider first the case k = 1 . We have 

V = J ■ ( 16 > 

Here £(t) denotes the number of lines for a particular realisation of the coalescent process at time t in a sample of size n = £(Q). 
The indicator function in Eq. ( fT6] l is unity when i(t) = j and zero otherwise. Averaging over realisations gives 



(Tj)= J~dt(l {Ht)=j} )= J~dtf nj (0,t 



(17) 



Here f n m(h,t2) is the conditional probability that n ancestral lines at t\ coalesce to m lines at time t% >t\. 

For a constant population size (x = 1), the coalescent is invariant under time translations, f n m (t] J?) = s n m tii — t\)H(t2 — h). 
Here H(t) = 1 if t > and zero otherwise. The conditional probability gnm{t) was derived by ITAVARH dl984l) . For m>2 the 
result is: 

g nm {t) = f i c nm p- b i t (18) 

j=m 

= t iv-"' 2j - 1 T ^ m +j- 1 ) r (») Hn + l) 

nmj [ ' m\(j - m) ! r(m) Y{n + j)T(n- j +1) ' 1 ' 

In the general case of a variable population size, as shown by IGRIFFITHS and TavareI i 19941) . the conditional probability 
depends only on the intensity A(?2) — A(fj) during the time-interval [fi,?2]: 

fnm(h,t 2 ) =g„ m (A(t2)-A{h)) . (20) 

Now consider the case k = 2. For ; > j we have simply 

XiXj = I dt!l m)=tl J q dt 2 l {m)=j} (21) 

dfil {t(n)=i} / dt 2l{((t 2 )=j}i 
Jti 

because the second indicator function vanishes when <t\. Averaging over realisations we find: 

(TiTj)= [ d/l/40,^) / dt2fij(h,t 2 ). (22) 
JO Jti 

This result is illustrated in Fig. [3j). In deriving it we have used the multiplicative rule 



( 1 M'l )=«'} 1 V(t2)=j} ) = fni (°> ? 1 )fij ( f 1 1 f 2 ) ■ 



(23) 
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For i = j, by contrast, we find 



%i = Jo dfll { f (' l)= ^ J dt2l i e ('2)=i} 

= 2 J At ^ l {m=j}j t df 2 1 {<fe)=/}> 



(24) 



which upon averaging yields 



t;)=2^ dfi/n/O.fOjf' dt 2 fjj{h,t 2 ) 



tf)=2 d/i/ n/ (0,fi) / df 2 /, 7 (/i,?2). (25) 



More general correlation functions are readily obtained in terms of multiple integrals over the functions /,„„. Inserting into (fTBI l 
we see that the combinatorial factors (vM) -1 • • • (v„!) _1 cancel to obtain 

n m\ m k -\ ^ ^ 

(J*) = kl £ E ■■■ L m vm k df!/„ mi (0,fi) - - - / dt k f mk _ imk {tk-i,tk)- (26) 

m 1 =2m 2 =2 m*=2 1/0 ■'fc-l 

Eq. d26b provides an explicit expression for the moments of the total branch lengths r„ in populations with smooth population- 
size variations. 

Note that Eq. d26| i expresses the k-th moment of T„ in terms of a 2fc-fold sum (according to < TT~8T > each factor of f„ jmj contains 
a sum over ji). Eq. d26l ) can be further simplified by explicitly performing the sums over mi,..., m k . This results in 



71=2 7t =2 -/0 ./», 



(27) 



The coefficients are determined by recursion: 



; f2n-h 



4;, = Im COTy = (2j-l)(l + (-iy)^i, (28) 

111=2 \ n ) 

h 

dn;ji,...J k = 52 mc nmji^m;j 2 -Jk ■ (^9) 

m=;'2 

For the first moment, an expression corresponding to Eq. (|26T i for the particular case k = 1 was derived bv lSLATKINl(ll996l) . 
Evaluating (j26j for = 1 using ( |27] i and d28l ) we find 

Here |_- • • J denotes taking the integer part. This result is equivalent to Eq. (3) in (lAuSTERLlTZ et at l fl997h . and also to the 
result obtained by summing Eq. (1) in jZlVKOVlC and WieheI|2008|) . For k = 2, the coefficients d n ,-j u j 2 are tabulated in Tab. Q] 
in appendix[B]for small values of n. In general, the nested integrals in Eq. d27l i cannot be simplified further; their form expresses 
the correlations of the times Ty due to population-size variations. 

Finally note that for « = 2, Eq. d26l i can be evaluated to give (TBi i. We show this explicitly because it demonstrates how the 
expression d26j i simplifies when k > n. We have 

(4) = kl [ d/i/ 22 (0,/i) / dl 2 f 22 {h,t 2 )--- f dt k f 22 {t k -i,tk) 

JO Jt { Jtk-l 

= kl f dr, fdt2- r dt k f 22 (0,t k )=k Tdf^-'e-^. (31) 

J0 Jt\ Jt k -i Jo 

This yields Eq. (TBI) . 

We conclude this section by remarking that appendix [A] summarises an alternative approach to calculating F„ (q) and the 
moments (T*), again resulting in Eqs. ( fT2l and (f26b . The approach described in appendix|A]yields a simple recursion, Eq. ( IA17I ). 
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which allows for a convenient calculation of the moments (T*). This result also demonstrates explicitly how the moments (T„ }, 
for a given curve x{t), depend upon the time at which the population is sampled. 

In the following two sections we describe a simple population model subject to population-size variations, and compare results 
of numerical simulations of this model to the analytical results obtained above. 



IV. A MODEL FOR A POPULATION WITH TIME-DEPENDENT CARRYING CAPACITY 

The purpose of this section is to describe a modified Wright-Fisher model with fluctuating population size. This model is 
used in the numerical simulations of sample genealogies described in Sec. [V] Recall the three key assumptions of the Wright- 
Fisher model: (a) constant population-size, (b) discrete, non-overlapping generations, (c) a symmetric multinomial distribution 
of family sizes. We have adopted the following approach: in our simulations, assumptions (b) and (c) are still satisfied, but 
assumption (a) is relaxed. 

We study a large but finite population of fluctuating size N T , where T = 1,2,... labels the discrete, non-overlapping generations 
forward in time. The model we have adopted is the following: consider a generation T consisting of N T individuals. The number 
of individuals in generation T + 1 is then given by 

#t+i = £& (32) 

.7=1 

where the random family sizes £,j are independent and identically distributed random variables having a Poisson distribution 
with parameter X T (specified below). Consequently the number N T+ \ is Poisson distributed with mean N T X T . 

This model exhibits a fluctuating population size N T , rapidly changing from generation to generation. As pointed out in the 
introduction, in large populations such fluctuations are averaged over by the ancestral coalescent process, and can be captured 
in terms of an effective population size. The resulting genealogies are simply described by Kingman's coalescent for a constant 
effective population size of the form (|T). 

Interesting population-size fluctuations occur on larger time scales, corresponding to 'slow' variations of the population size 
over several generations. Such slow changes are most commonly interpreted as consequences of a changing environment. A 
natural model for such changes is to impose a finite carrying capacity K r on the population which varies as a function of T. This 
is the approach adopted in the following, and we choose 

K = ; (33) 

for a certain parameter value r > 0. Here K z+ \ is the carrying capacity in generation T + 1. If the environmental changes affected 
the population through fertility variations, K z+ \ would be replaced by K T in Eq. d33l . Eq. (l33l l is chosen so that the population 
ceases to grow on average when the carrying capacity is reached (A T = 1 foiN T = K T+ \). When the population size is small, the 
population growth follows the logistic law, he = 1 +r(l —N z /K T+ i), where r is the logistic growth rate. The particular form of 
Eq. (l33l ) ensures that A T > . 

Note that fluctuations of N z in this model are due to two different sources: rapid fluctuations are caused by the randomness 
of the family sizes, slow fluctuations are caused by the time dependence of the carrying capacity. Our choice for the time 
dependence of K T is dictated by the following considerations. The aim is to describe the influence of a fluctuating population 
size upon the statistics of genetic variation. To this end we need to consider the functional form of K r . A simple choice for K T 
is a periodically varying function, such as 

K t = K [l + esm(2nV'u)]. (34) 

Note that a more complex dependence of K T upon T can be obtained from superpositions of such functions with different 
amplitudes e and frequencies v. Here we use simply d34l ). and investigate how the statistics of genetic variation in a sample 
depends upon frequency of the fluctuations in K T . 

Fig. |4]shows a realisation of a curve N T obtained in this manner (the choice of parameters is given in the figure caption). The 
figure clearly exhibits fluctuations in N T on two time scales. As pointed out above, we are interested in determining the effect of 
the size variations occurring at long time scales. 

Last but not least we note that conditional on the sequence of population sizes, the genealogy of a set of individuals sampled 
at time x can be determined recursively by randomly choosing ancestors in the preceding generations. This is ensured by 
the assumption that, conditioned on the values of N T and N T+ \, the family sizes follow a symmetric multinomial distribution 
Mn(Af T +i; . . . , The resulting correspondence with the Wright-Fisher rule of reproduction ensures that the genealogies 
can be determined recursively in the way suggested above. 
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FIG. 4 Shows one realisation of the curve N? obtained from simulations of the model described in Sec. IIVI (black solid line). Choice of 
parameters: r = I, Kq = 100, e = 0.9, and KqV = 1. Also shown is an average over the fast fluctuations, Kq[1 + £sin(27TV"r)], red dashed line. 
The upper horizontal axis illustrates where the population is sampled, and how time is counted backwards in the coalescent approximation. No 
denotes the size of the population at the time of sampling. 



V. COMPARISON BETWEEN NUMERICAL SIMULATIONS AND COALESCENT PREDICTIONS 

In this section we discuss the numerically computed distributions shown in Fig. Q] in terms of the results obtained using the 
coalescent approximation. The shapes observed in Fig. Q] are conveniently characterised in terms their mean (7;,), variance, 
skewness, and kurtosis: 



var(r„) = (r„ 2 )-(7; 7 ) 2 , 

((T n -(T n )y) 



skew(r„) = 
kurt(r„) = 



varV2(r„) 

((T n -(T n )Y) 
var2(r„) 



(35) 
(36) 



Recall that for a normal distribution the skewness vanishes, and the kurtosis equals three. We can write the skewness and 
kurtosis in terms of the moments (T n k ) using ({T„ - (T n )) 3 ) = (T 3 ) -3(T,f)(T„) +2(T„) 3 and ((T n - (T„)) 4 ) = (T*) -A{T^){T n ) + 

6(r„ 2 )(r„) 2 -3<r„) 4 . 

As argued in Sec. [IV]and as shown in Fig. |4j our model populations exhibit fast size changes due to the random distribution of 
family sizes. As pointed out in the introduction, these fluctuations are averaged over by the genealogical process and need not be 
considered. The model populations are also subject to slow (and deterministic) size fluctuations given by the time-dependence 
(f34-b of the carrying capacity. Averaging over the fast fluctuations these give rise to a smooth population-size dependence x(t ). 
Given Eq. ( 1341 ), the distribution of T„ depends upon the instance in time when the population is sampled. In the simulations we 
sampled at a particular point (illustrated in Fig. |4]as a dashed vertical line), so that 



x(t) = 1 +£sin(o)f) 



(37) 



Here the frequency is given by co = IkvKq, and time t is now counted backwards, as in Sec. [TTT] If the population were sampled 
at a different time, the distribution p(T„) of T„ (and hence its moments and the corresponding function F„ (q)) would change: the 
distribution depends for example upon whether most recently the population was expanding or declining. The results derived in 
appendix [A] make it possible to determine the corresponding changes to p(T„) in a transparent manner, but we do not discuss 
this issue further here. 

Fig.[5]shows how the mean, variance, skewness, and kurtosis of the distribution of T„ depend on the frequency a> of the pop- 
ulation size variation, Eq. ((37). Shown are results of numerical simulations of the model described in section|IV](symbols), and 
results obtained within the coalescent approximation using Eq. ( IA17b . We observe that the coalescent approximation describes 
the results of the numerical simulations well, even for small population sizes. 

In the numerical simulations we have found that, for very small population sizes, random fluctuations of A^ T around the time- 
dependent carrying capacity K T become increasingly important. Since we suspected that the small deviations observed in Fig. 
[5^ for Kq = 100 were due to such fluctuations, we performed slightly modified simulations imposing a deterministic law upon 
N T by forcing A^ T = K T in every generation (where K T is given by d34l). Comparison of the corresponding results (not shown) 
with Fig. [5ji indicates that the deviations for Kq = 100 at large frequencies are indeed caused by the stochastic fluctuations in 
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FIG. 5 Shows mean (a), variance (b), skewness (c) and kurtosis (d) of the distribution of T n for samples of size n = 10, as a function 
of the frequency of the population-size fluctuations. Shown are results of numerical simulations (10,000 simulations, Kq = 100, triangles; 
Kq = 1,000, diamonds; and A*o = 10,000, circles) as well as results computed within the coalescent approximation described in Sec. [In] red 
solid lines. Black dash-dotted and dashed lines show the approximations for small frequencies, Eqs. 040b and ( 142b . and for large frequencies, 
Eqs. {47} and {48}. The expressions for the limiting behaviours of the skewness and the kurtosis are shown in panels c and d, but are not given 
in the text. The remaining parameter values are r = 1 and e = 0.9, as in Fig.Q] 



the population dynamics underlying Fig.|5ji. A different interpretation of this effect is the following: when the population size 
is very small, and when e is close to unity, the population may exhibit a non-negligible probability of becoming extinct during 
the expected time to the most recent common ancestor for a sample of size n. In this case we have conditioned on the existence 
of the population during 100,Kb generations using rejection sampling. In practice this avoids extinction, but it leads to a biased 
size distribution. 

Consider now the frequency dependence of the moments shown in Fig. [5] It can be qualitatively and quantitatively understood 
using Eq. ( |27| ) together with the following expression for A(f ): 



f ds 

A(f) = 



™ + i _ 1 ^an ( ) + I arctan ' tan(fi » /2)+e 



ol + esin(cos) {(o/2n)\/\ - e 2 



(38) 



We discuss the limits of small and large frequencies (0 separately. In the limit of o — > 0, Eq. ( l38l simplifies to A(f ) ^e(Ot 2 
. Inserting this into ( f30b and approximating 



d f e-^ A M«lfl + ^ S ) (39) 



bj V bj 
we find 

(T„)^2h n +4eco(g„ -h n /n). (40) 
Here h n = H}Z{r l and g n = L"Z\r 2 - Eq. go} is shown in Fig. |5j» as a dash-dotted line. To compute the variance we 
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J0 Jh 



approximate 

1 bj, + 2b h 

T^-^^pf. (41) 

and find an approximate expression for (r„ 2 ) which results in the following expression for the variance: 

var(r„)«4^ + 16eo)(/„-g n + ^^) (42) 

with /„ = \ (tn~ 3 +tn~ 2 h m j r \). The limiting value for zero frequency is that of the standard coalescent with constant popula- 
tion size x=l. Eq. d42l is shown in Fig. 0) as a dash-dotted line. Similarly the standard results for the constant-size coalescent 
are obtained for the skewness and for the kurtosis in the limit of co — > 0. This limiting behaviour is illustrated in Fig. QJi which 
shows that the distribution of T„ approaches that for Kingman's coalescent for a constant population size x = 1 in the limit of 
small frequencies. We note that for to <C 1, the population-size dependence is essentially that of a declining population, because 
the time to the most recent common ancestor is reached before the first maximum in x(t) going backwards in time (see Fig. [4] 
and Eq. 071 

Of particular interest is the limit of large frequencies, as we now show. As the frequency tends to infinity, one expects that the 
coalescent process averages over the population-size oscillations, and the standard coalescent process with a constant effective 
population size should be obtained. For large but finite frequencies, by contrast, Fig. |5ji exhibits deviations from the standard 
coalescent behaviour. In the following we analyse the behaviour of the moments in this regime. In the limit of large frequencies, 
Eq. ( f38l > simplifies to 



arctan 



A(f) = —= V £ ~ \-O((0 2 ) + oscillatory terms. (43) 

Vl-e 2 covl-e 2 

For large frequencies, the function A(f ) is well approximated by a shifted linear function 

A(f)«f/x e ff + A . (44) 

Here 

i r T dt 



x eff = lim - / — — - = (45) 

\t^°°T Jo l+esin(cof)y 

is the effective population size according to Eq. (fTJ, it describes the influence of the demographic fluctuations upon the part of 
the genealogy in the far past. The small offset 



— arctan(e/v1 - e 2 ) e . 

Ao = ?s for e not too close to unity (46) 

XeffCO X e ff(B 

describes the influence of demographic changes on the most recent part of the genealogy. Inserting the approximation d44b into 
d27l i we find for large frequencies (and when the amplitude e is not too close to unity): 

(T n ) « 2x en h n + — . (47) 

CO 

The first term in (07J is the expected time of Kingman's coalescent for a constant effective population size x e ff. The curve 
corresponding to d47l ) is shown as a dashed line in Fig. [5^. 

We now discuss the behaviour of the variance shown in Fig. 0). For the second moment we find: 

,i, i , i % 4nh„x e fr£ 

(T n 2 ) « 4x 2 ff (g„ + /i 2 )+ ^ eff , (48) 

The first term in Eq. ( |48l > corresponds to the second moment of T„ in Kingman's coalescent with a constant effective population 
size x e g. The second term in d48l l represents a correction due to finite but large frequencies, it depends in a simple fashion on the 
effective population size x e s and on the sample size n. 

Comparing Eqs. d4Tb and (l48b we arrive at the conclusion that the corresponding correction for the variance var(r„) vanishes. 
This is consistent with the fact that, at large frequencies, the variance of T n is surprisingly insensitive to changes in frequency 
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(as opposed to the behaviour of (T n ), see Fig. [5^ and b). In fact, the limiting value (shown in Fig. 0} as a dashed line) is a very 
good approximation to var(r„) down to a> w 3. 

Consider now the skewness and the kurtosis shown in Figs. [5£ and d. Their behaviour is similar to that of the variance: over 
a substantial range, the skewness and the kurtosis are essentially independent of co. The results shown in Fig. [5]imply that over 
a large range of frequencies, the distribution of the total branch lengths T„ can be approximated as follows: the distribution is 
essentially that of the standard Kingman coalescent with an effective population size x e g, but the distribution is shifted such that 
its mean is given by Eq. d47| i, rather than by 2x e ffh n . 

One may wonder when this 'rigid shift' occurs. Given Eq. ( 126*1 ) it is straightforward to work out the fluctuations of the times Ty 
within the approximation (|44T >. We find that for j < n, the expected value of Tj is exactly that of the standard Kingman coalescent 
with effective population size x e g. But for j = n it is rigidly shifted by — x e ffAo. This indicates that the genealogies are essentially 
those of the standard coalescent, but modified by an initial rigid shift. In the parameter regime discussed here, the distribution 
of times is expected to be well approximated by a two-parameter family of distributions: 



P(T„ < z) : 



exp 



(49) 



when z/x e ff > — «Ao, and P(T„ <z)«0 for smaller values of z. The first parameter is the effective population size x e ff which 
determines the slope of the function A(f ) at large times and describes the demographic effect on the far past of the genealogy. 
The second parameter, Ao describes the influence of the demographic fluctuations on the initial part of the sample genealogy. 
This parameter can be negative (initial population expansion, this is the case shown in Fig. [5]) or positive (initial population 
decline). When Ao > 0, the distribution p(T„) is rigidly shifted to the left. In this case the approximation d44l) is expected to 
break down when the body of the distribution reaches T„ = 0. 

Note that the distribution d49l cannot be described by a single parameter (a 'generalised effective population size'). The 
approximation d49l was used to generate the red dashed curves in Fig. [TJl and e. 



VI. DISCUSSION AND CONCLUSIONS 

The aim of this paper was to investigate how the frequency of smooth population-size fluctuations determines the shape of the 
distribution of total branch lengths of sample genealogies, and thus of statistical measures of genetic variation. 

We have performed simulations for a modified Wright-Fisher model of a population subject to a time-periodically varying 
carrying capacity and have determined the distribution of the total branch lengths, shown in Fig. [JJ We have characterised 
how the shapes of the distributions depend upon the frequency of the population size fluctuations by computing the frequency 
dependence of the moments of these distributions. We could explain these dependencies in terms of coalescent approximations. 
In particular, we derived a general expression - Eq. ( |27] i - for the moments (7^) in populations subject to smooth population 
changes of otherwise arbitrary form. 

Our results show how quickly (or slowly) the standard coalescent result for a constant (effective) population sizes is recovered 
in the limits of large and small frequencies. More importantly, our coalescent results allow to determine how significant devia- 
tions are at large but finite frequencies. In this case we have argued that at large frequencies, the distribution of T„ is essentially 
that of the standard Kingman coalescent with an effective population size x e g, but with a shifted mean value 

«-! i ,, e 

(T„)=2x eS y - + — . (50) 
PiJ «> 

The first term on the rhs corresponds to the result of the standard Kingman coalescent with constant effective population size 
x e ff. The second term on the rhs is the correction term resulting from the population-size variations (£ is the amplitude of the 
population-size oscillations, ft) its frequency, and n is the sample size). Last but not least we have found that the coalescent 
approximation yields a reliable description of the numerical data, even for very small populations. 

These results enable us to determine how the distribution of the number S n of mutations (single-nucleotide polymorphisms) in 
a sample of size n depends upon the frequency and on the amplitude of population-size fluctuations: Eq. (0| allows to compute 
moments of S„ from Eq. (|27] |. In this way we have determined the mean, variance, skewness, and the kurtosis of the distribution 
of S n . The results are shown in Figs. [6]and[7] As expected, the moments of S„ approach those of 0T,,/2 as the scaled mutation 
parameter 9 increases. This can be verified by comparing the red curves (corresponding to e = 0.9) in Fig.[7]to the red curves 
in Fig. [5] The higher moments converge more slowly than the mean and the variance. In conclusion, Figs. [6]and[7]demonstrate 
that the distribution of the number S„ of mutations in samples of size n depends in a complex manner on the amplitude and on 
the frequency of the population-size variations, and on the mutation parameter 6. 

We close with a number of remarks. First, Eq. ( f2Tb is easily generalised to describe the moments of observables which are 
polynomial functions of the times Ty (see Fig. [3*^ for a definition of these times). Particularly simple is the case of observables 
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FIG. 6 Shows mean (a), variance (b), skewness (c) and kurtosis (d) of the distribution of S„ for samples of size n = 10 and scaled mutation 
parameter 6 = 1, as a function of the frequency of the population-size fluctuations, for three values of e: £ = 0.5 (dashed-dotted green lines), 
e = 0.9 (solid red lines), and £ = 0.99 (dashed blue lines). The curves were obtained by iteration of Eq. JA17t in combination with d38b . 



A that are linear functions of the times Ty, A„ = Yfj=2 a j T j- ^ n tn i s case the k-th moment of A n is given by Eq. d27b , but with 
modified coefficients: the factors m in Eqs. (l28l l and d29ll are replaced by a m . 

Second, some observables (such as the F-statistic dFu and Llill993h ) can be written as linear functions of Ty, but with random 
coefficients. In this case too it is possible t o explicitly compute th e moments of the distribution of the observable. These two 
questions are addressed in a separate paper dSAGlTOV et a/.ll2010h . 

Third, a result derived in appendixfA] Eq. dA17K allows us to determine in a transparent fashion how the fluctuations of T n and 
other observables depend upon the time at which the population is sampled. This will make it possible to discuss for example 
how Tajima's D-statistic or the F-statistic depend upon the time of sampling after a bottleneck, a population expansion, or a 
decline. 

Fourth, population-size fluctuations are sampled non-uniformly by the genealogies: initial coalescent events occur at faster 
rates and are thus more sensitive to recent size fluctuations. Remote coalescent events, by contrast, occur at slower rates thus 
damping the effect of size fluctuations in the far past. We therefore expect significant deviations from the standard coalescent 
behaviour arising from the most recent history for large sample sizes n. It would be interesting to quantify this expectation by 
computing the covariances and higher moments of the times Xj during which the sample genealogy has j lines: first for large 
; w n and j w n we expect to observe strong correlations (t,t,) — (t ( )(t ; ) and thus deviations from the coalescent. Second for 
small values of i and j we expect the times T; and Xj to de-correlate and to follow the distribution of the standard coalescent 
(with an effective population size). 

Fifth, the model introduced in Sec. |IV]assumes a carrying capacity that varies sinusoidally, with a single frequency. It turns out, 
however, that our findings are valid for arbitrary time-dependent fluctuations with sufficiently strong modes at small frequencies. 
Examples are linear combinations of high-frequency oscillations, or stochastic fluctuations around a constant population size 
with sufficiently short auto-correlation time. In this more general case, too, we expect that A(f ) is well approximated by d44l i. If 
this is the case, the distribution of times is of the form d49l when Ao is small. 

Taken together, the results derived in this paper give a rather complete understanding of the fluctuations of empirical observ- 
ables due to smooth population size variations. These results will be significant when attempting to disentangle the effects of 
population-size variations from other factors influencing genetic variation. 

Our results raise the question under which circumstances the deviations from standard coalescent behaviours due to 
population-size fluctuations (Figs. Q] [5] |6] and |7]i are most likely to strongly affect the interpretation of empirical data. As 
our analysis indicates, the deviations become substantial when the frequency co = 2kvKq is of the order of or less than the 
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inverse expected time between coalescent events in the sample. Here v is the frequency of the population size variations, Eq. 
d37l l. and Kq is a suitable measure of the population size (the arithmetically averaged carrying capacity in our example). In other 
words, rapid population-size fluctuations will have the strongest effect (other than simply determining the effective population 
size, Eq. (Q])) in small local sub-populations with restricted gene flow between sub-populations with different fluctuations. The 
deviations are expect to be smaller at larger spatial scales, because the ancestral process averages over the spatial fluctuations. 
More generally, we conclude that deviations from standard coalescent behaviour are expected for populations subject to an en- 
vironment which smoothly changes as a function of space and time. An example for such a population is the marine snail L. 
saxatilis. Its habitat on the Northern coast of Bohuslan (Sweden) is fragmented into sub-populations with strongly restricte d 
gene flow between them, effective population sizes of sub populations have been found to be very small (|jOHANNESSONll2009l) . 
Starting from the results derived in this paper, we hope to determine gene genealogies in such fragmented populations subject to 
smooth variations of population size in space and time. 
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FIG. 8 a Illustrates the definition of F„(q,t) which for q = 9/2 is the probability that n sequences sampled at time t are identical. The 
corresponding branches in the tree are drawn as solid lines, and n = 3 in the figure, b Shows schematically how the population size x(t) 
depends upon time. 



APPENDIX A: Alternative calculation of F„(q) and (r*) 

In this appendix we demonstrate and alternative way of calculating function F„ (q) and the moments (T„) within the coalescent 
approximation. Given a realisation of the curve x(t), the function F„(q) can be calculated as follows. 

Consider the function F„(q,t) that is, for q = 9/2, the probability that n sequences sampled at time t are identical. The time 
argument describes how F„ depends upon the time at which the population is sampled, given a smooth population-size curve 
x(t). The definition of F n (q,t) is illustrated for the case n = 9 in Fig. [8] In Sees. ||ll]and[Vl the populations were sampled at t = 0, 
which corresponds to the choice F n (q) = F n (q,Q). The more general quantity F n (q,t) allows to determine how the fluctuations 
of sample genealogies depend upon the time of sampling (in a population of constant size, F n is independent of t ). 

To obtain a recursion for F„(q,t) in a population of fluctuating size, take q = 9/2 and consider a small time interval 8t. A 
change in F„ during this time interval is due to either a mutation in one of the ancestral lines, or to two ancestral lines having a 
common ancestor. Thus, to first order in 8t 



F n (q,t) = 

Taking the limit 8t — > 0, we obtain: 



n(n-l) 
1 -nqOt , . Ot 



2x(t) 



F n (q,t + St) + ; , / StF n -i{q,t + 8t). 



jF n {q,t) = 



nq 



n(n — 1) 



2x{t) 



2x(t) 



n(n — I) 
F n (q,t)- V ; F„_i(g,/). 



(Al) 



(A2) 



The recursion is terminated by F\(q : t) = 1 for all values of t. In a population of constant size x = 1, F„(q,t) does not depend 
upon t and the result © is immediately recovered from (IA2t . To find the general solution, Eq. dA2b is rewritten as follows: 



F n (q,t)=b n r^e^-HMAM-AM]^^). 
Jt x(s) 



It is convenient to consider the function G„(q,t) = e nqt b " A ^F n (q,t). It obeys the recursion 



G n (q,t)=b n I Ae-^-^Gn-ife.jJ^^C-OW. 
x(s) 



(A3) 



(A4) 
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Here Jzf^ is the operator 



mf)(t) = h r^e-^ A ^f( S ) , (A5) 
Jt x(s) 



and <p(s) = exp(-qs). In terms of this operator, the recursion is simply solved by repeated action of JC^ on the function 0: 

G n (q,t) = (JSf„0JSf„_i ■ ■ ■ 0^20 2 )(?) , (A6) 

where we have used the fact that G\ (q,s) = exp(-qs). Noting that G„(q,0) = F n (q,0) = F„(q) we obtain the desired expression 
for F n (q): 

F n (q) = (J^J^-i ■ ■ -0^20 2 )(O) . (A7) 

The moments (T k ) can be calculated in a similar fashion. Consider the function (T*(t)) describing the moments of the total 
length of all solid branches shown in Fig. [8] Then (7^) = (7^(0)), and (T„(t)) obeys the recursion: 



j t (T n k (t)) = -kn(Tt\t)) + J*L [(T n k (t)) {Tl.it)) 



(A8) 



We introduce the function hf\t) = e - fo " A « (T n k (t)}. With the initial conditions h ( °\t) = e -^ A M for n > 2 and h\"\t) = for 
m > 1, the recursion dA8b is simply: 



h%Xt)=kY j m{& n & n ^---& m+l J?h£ l) )(t) (A9) 

m—2 

where J? is the integral operator J t °° dt ' . 

Now we show that the action of the chain Jz?„Jz^,_i • • • J?J n +i of operators on an arbitrary function / can be represented in 
terms of a single integral. To show this, it is convenient to make a change of variables to z = A(f ): 

(JS?t/)(z) = b k J~dyerW>f(y) . (A10) 

The task is to seek a kernel K nm (z,z') such that for any function / 

(jSf„JSf„-i • • -Xnf)(z) = J_°°dz'K nm (z,z')f(z') . (All) 

The kernel must satisfy 

K nm (z,z') = b n l Aye-^yK n ^, n {y,z r ) (A12) 

Together with the initial condition K mm (z,z') = b m exp[—(m— \)z']H(z' —z), this recursion allows to compute the kernel in 
closed form. This can for example be achieved by considering the Laplace transform of ( lAlOl i. We find: 

K nm (z,z') = t knnje-^-fiK^-W (A13) 

j=m 

= lV _ m 2;-i r(m+j-i) r(«)r(« + i) 

" mj 1 ' 2 r{m)r{m-\)T{j-m+l)T(n + j)T{n-j + i)' 
This kernel can be used to evaluate dA9b . For any function g(t) we have that 

■J? m+l J?g)(t) = j"dt'A nm+1 (A(/),A(/'))s(0 (A14) 
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with 

r' 

A„ m+ i(z,z') = J dz"K„ m+l (z,z") (A15) 
(and A nn+ \(z,z') = 1). Inserting this result into ( IA9t yields 

n i>oo 

h ( n\t) =k^m At'A nm+x (A(O,A(0) h { t l \t') . (A16) 



= 2 



Identifying A nm+1 (z,z') = exp(-fe„z)g„ m (z'-z)exp(fc m z') we find 



(r*(t)>=*J>/ dt>f nm (t,t>)(TJt\t>)}. (A17) 

m=2 7 ' 



This recursion yields Eq. 
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APPENDIX B: Coefficients d„. juh for n = 2, . . . , 10 



In Tab. Uwe give the coefficients d n -uj 2 determining the second moment (r„ 2 ) according to Eq. (|27T i for n = 2, . . . , 10. Note 
that the coefficient for n = 2 is consistent with Eq. dl41 l. 
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TABLE I Shows coefficients d, r j t j 2 occurring in Eq. ((27} for n = 2, . . . , 10. Coefficients for odd values of ji vanish. 



